This is a test model for estimating ambient temperatures in different parts of an urban area. As a showcase and proof of concept, the first step uses data from the Array of Things (AoT) sensor network to train and test a machine learning model. The AoT project is expected to expand to other cities throughout the U.S.; the model could therefore be generalized better within a year, once data are available for at least 9 other urban areas.

The predicted temperatures can help urban planners, city officials, healthcare providers, and other related institutions obtain a district-level map of temperature changes inside a city and plan accordingly, for example for heatwave adaptation and mitigation.

The Approach

The problem can be broken down into two subproblems, as follows:

1) A predictive model for calculating ambient temperature

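A machine learning (ML) algorithm that considers different candidate features for predicting the ambient temperature. The features currently considered for training and testing the model are the variables listed in the dataset summary further below: lat, lon, population, Day_population, LST, imprev, and Land_Cover.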

2) A downscaling model for daily LST values

One problem is that the 30x30 m remote sensing data for land surface temperature (LST) from Landsat are available only at 16-day intervals. A model based only on these data is therefore not trustworthy for daily predictions. On the other hand, daily remote sensing data can also be used to calculate LST, but usually at a coarser spatial resolution (MODIS at 1 km). A deep learning model is proposed that is trained on temporally overlapping datasets from both sources to downscale the 1 km LST to 30x30 m resolution. The model will be trained and validated on the overlapping dates (every 16 days) and then used to convert the daily 1x1 km LST values into 30x30 m values for the study period. The suggested method is transfer learning from a deep learning model pre-trained by a previous Ph.D. candidate at the SDS lab, Thomas J. Vandal.
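
The split between training dates and dates that need downscaling can be sketched as follows. This is a minimal illustration in Python, not the code used for this report. It assumes a 16-day Landsat revisit, MODIS coverage on every day, and no loss of scenes to cloud cover; the function name and the dates are hypothetical.

```python
from datetime import date, timedelta

LANDSAT_REVISIT_DAYS = 16  # revisit interval stated in the text


def split_study_period(first_landsat_day, start, end):
    """Split a daily study period into overlap days and MODIS-only days.

    first_landsat_day is a hypothetical known Landsat acquisition date;
    MODIS is assumed to be available on every day of the period.
    """
    overlap, modis_only = [], []
    day = start
    while day <= end:
        offset = (day - first_landsat_day).days
        if offset % LANDSAT_REVISIT_DAYS == 0:
            overlap.append(day)      # both sensors: usable for training/validation
        else:
            modis_only.append(day)   # MODIS only: 1 km LST must be downscaled
        day += timedelta(days=1)
    return overlap, modis_only


overlap_days, modis_only_days = split_study_period(
    first_landsat_day=date(2018, 1, 1),  # illustrative date, not from the report
    start=date(2018, 1, 1),
    end=date(2018, 3, 31),
)
print(len(overlap_days), "overlap days;", len(modis_only_days), "MODIS-only days")
```

Under these assumptions only a small share of days has both sensors, which is why the downscaling model has to be trained on few scenes and applied to many.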

The final product will be a web app in which a user (an urban planner, a healthcare provider, or anyone interested in ambient temperature highs in the city) can click a point inside the city and receive predictions of the ambient temperature changes at that location, along with a comparison to other parts of the city. The initial model will focus on the city of Chicago, as explained below.

Proof of Concept

The following steps were used to gather the data and create a preliminary machine learning model.

Mapping Nodes on Block Groups

Below is a basic map showing the locations of AoT nodes in Chicago and the ACS Census Blocks with their total populations.

It looks like the available nodes do not cover all the Census Blocks. To reduce the study area, I’ll add Chicago community areas as a layer.
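
Assigning each node to the area that contains it is a point-in-polygon problem. The sketch below is a Python illustration using the ray-casting test, not the code used for this report. It treats longitude/latitude as planar coordinates, which is an approximation, and the area polygons, node coordinates, and function names are all hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_nodes(nodes, areas):
    """Map each node id to the first area that contains it, or None."""
    result = {}
    for node_id, (lon, lat) in nodes.items():
        result[node_id] = next(
            (name for name, poly in areas.items()
             if point_in_polygon(lon, lat, poly)),
            None,
        )
    return result


# Hypothetical rectangular areas and node locations (lon, lat)
areas = {
    "area_a": [(-87.70, 41.80), (-87.60, 41.80), (-87.60, 41.90), (-87.70, 41.90)],
    "area_b": [(-87.60, 41.80), (-87.50, 41.80), (-87.50, 41.90), (-87.60, 41.90)],
}
nodes = {
    "node_1": (-87.65, 41.85),
    "node_2": (-87.55, 41.85),
    "node_3": (-87.90, 41.85),  # outside both areas
}
assignment = assign_nodes(nodes, areas)
print(assignment)
```

Nodes that map to None fall outside every area, which is one way to see which parts of the city have no sensor coverage.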

Getting Landsat 8 Data and Calculating Land Surface Temperature (LST)

Landsat is a multispectral satellite program currently on the eighth satellite of its series. Landsat-8 data are freely available on the USGS Earth Explorer website. All we need to do is sign up and find a scene that matches our study area. The notation used to catalog Landsat-8 images is called the Worldwide Reference System 2 (WRS-2). Landsat follows the same paths, imaging the earth every 16 days. Each path is split into multiple rows, so each scene has a path and a row. Sixteen days later, another scene will have the same path and row as the previous scene. This is the essence of the WRS-2 system.

USGS provides shapefiles of these paths and rows that let us quickly visualize, interact with, and select the relevant images. After checking the WRS-2 shapefile for Landsat 4-8, I found that the city of Chicago is within Path 23 and Row 31. (Alternatively, a converter can be used to find the path and row from latitude and longitude.)

For now, I used the GloVis tool to download 6 Landsat 8 OLI/TIRS C1 Level-1 images. To test the model for the proof of concept, I am adding just one layer (Band 6) of a downloaded Landsat GeoTIFF image. The following figure shows the extracted Band 6 Landsat image containing Chicago.

These are used to calculate LST values and to pick the value for each 30x30 m tile in the city.
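
As a rough illustration of the first step in such a calculation, the Python sketch below converts a raw thermal-band pixel value (digital number) to top-of-atmosphere brightness temperature using the standard two-step Landsat Level-1 conversion. It is not the code used for this report. The constants are the commonly published Landsat 8 Band 10 values and are an assumption here; in practice they should be read from the scene's MTL metadata file (RADIANCE_MULT_BAND_10, RADIANCE_ADD_BAND_10, K1_CONSTANT_BAND_10, K2_CONSTANT_BAND_10). Brightness temperature is not yet LST: the emissivity correction needed for LST is omitted.

```python
import math

# Assumed Landsat 8 Band 10 constants; read the real values from the MTL file.
RADIANCE_MULT = 3.3420e-4   # radiance multiplicative rescaling factor
RADIANCE_ADD = 0.1          # radiance additive rescaling factor
K1 = 774.8853               # thermal conversion constant K1
K2 = 1321.0789              # thermal conversion constant K2 (Kelvin)


def brightness_temperature_celsius(dn):
    """Convert a thermal-band digital number to brightness temperature (deg C)."""
    # Step 1: digital number -> top-of-atmosphere spectral radiance
    radiance = RADIANCE_MULT * dn + RADIANCE_ADD
    # Step 2: radiance -> brightness temperature in Kelvin
    kelvin = K2 / math.log(K1 / radiance + 1.0)
    return kelvin - 273.15


# Hypothetical pixel values, for illustration only
for dn in (20000, 25000, 30000):
    print(dn, round(brightness_temperature_celsius(dn), 2))
```

Applied to every pixel of the thermal raster, this yields one temperature per 30x30 m tile, which can then be sampled at the node locations.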

Data Cleaning and Exploratory Data Analysis

In this section, one day of data is extracted and an R data frame is created. The first rows of the data frame are shown below. It contains one day of data for the location of each of the previously mapped sensor nodes.

And a summary of the dataset:

##     amb_temp            lat             lon           population  
##  Min.   :-11.100   Min.   :41.69   Min.   :-87.76   Min.   : 786  
##  1st Qu.: -8.175   1st Qu.:41.79   1st Qu.:-87.68   1st Qu.:1700  
##  Median : -5.850   Median :41.88   Median :-87.66   Median :2544  
##  Mean   : -4.679   Mean   :41.85   Mean   :-87.66   Mean   :2956  
##  3rd Qu.: -1.108   3rd Qu.:41.91   3rd Qu.:-87.63   3rd Qu.:3982  
##  Max.   :  5.600   Max.   :41.97   Max.   :-87.54   Max.   :7868  
##                                                                   
##       LST         Day_population     imprev         Land_Cover
##  Min.   :-5.600   Min.   : 676   Min.   :0.1071   10     :12  
##  1st Qu.:-0.600   1st Qu.:1887   1st Qu.:0.3879   9      :10  
##  Median : 1.400   Median :2786   Median :0.5412   4      : 8  
##  Mean   : 1.158   Mean   :3103   Mean   :0.5421   11     : 7  
##  3rd Qu.: 3.300   3rd Qu.:4144   3rd Qu.:0.7291   6      : 7  
##  Max.   :12.200   Max.   :7796   Max.   :0.9254   7      : 7  
##                                                   (Other):23

Visual comparisons of the data.

Boxplots:

Histograms:

Correlation plot for the variables
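
For reference, the quantity behind a correlation plot can be sketched in a few lines of Python. This assumes the plot shows pairwise Pearson correlations, which the report does not state; the column values below are hypothetical and only the column names are taken from the dataset summary.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of named numeric columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical values, for illustration only
columns = {
    "amb_temp": [-8.2, -5.9, -1.1, 2.4, 5.6],
    "LST": [-0.6, 1.4, 3.3, 6.0, 12.2],
    "imprev": [0.39, 0.54, 0.73, 0.61, 0.93],
}
matrix = correlation_matrix(columns)
print(round(matrix["amb_temp"]["LST"], 3))
```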

Training the models and comparison

As this is a regression problem, the following models are considered for creating the predictive model.

  • Linear Regression (LR)
  • Generalized Linear Model (GLM)
  • Support Vector Machine (SVM)
  • Classification and Regression Tree (CART)
  • Random Forest (RF), added as an ensemble model

The data were randomly split into 80% training and 20% validation sets. All data were then normalized before running the models, and 10-fold cross-validation was used for fitting the models. The following diagram shows the evaluation of each method based on three metrics: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R-squared.
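
The evaluation setup described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code used for this report: normalization is assumed to be a z-score (the report does not say which scaling was used), R-squared is computed as 1 - SS_res/SS_tot (the tooling behind the report may define it differently, for example as a squared correlation), and all function names and data are hypothetical.

```python
import math
import random


def train_validation_split(rows, train_fraction=0.8, seed=42):
    """Randomly split rows into training and validation lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def fit_standardizer(values):
    """Return the mean and (population) standard deviation of values."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, std


def standardize(values, mean, std):
    """Z-score values with a previously fitted mean and standard deviation."""
    return [(v - mean) / std for v in values]


def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, folds[i]


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical row ids standing in for the node records
train_rows, validation_rows = train_validation_split(list(range(50)))
folds = list(k_fold_indices(len(train_rows), k=10))
print(len(train_rows), len(validation_rows), len(folds))
```

Each candidate model would be fitted on the training indices of every fold and scored on the held-out indices with the three metric functions, and the fold scores averaged for comparison.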

This is only a showcase of one part of the proposed model, and the data used may not be accurate at this stage.